Summary: Total Attendance by Type

theme_park |>
  group_by(Year, Type) |>
  # Rescale attendance to units of 100,000 before summing
  mutate(Attendance = Attendance / 100000) |>
  summarise(sum = sum(Attendance)) |>
  arrange(Type) |>
  pivot_wider(names_from = Type, values_from = sum) |>
  knitr::kable(
    digits = 3,
    caption = "Summary of Attendance for Three Types of Facilities From 2019 to 2022"
  )
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
Summary of Attendance for Three Types of Facilities From 2019 to 2022

| Year | Amusement/Theme Park | Museum  | Water Park |
|------|----------------------|---------|------------|
| 2019 | 37996.4              | 20100.8 | 5898.9     |
| 2020 | 13031.1              | 4664.5  | 2313.5     |
| 2021 | 22463.7              | 6459.0  | 3473.5     |
| 2022 | 21280.8              | 11603.3 | 4678.3     |

theme_park |> 
  group_by(Year) |> 
  plot_ly(y = ~Attendance, color = ~Year, type = "box", colors = "viridis")
theme_park |>
  group_by(Region, Year) |>
  # Mean attendance for each Region and Year
  summarize(attend_mean = mean(Attendance)) |>
  plot_ly(x = ~Year, y = ~attend_mean, color = ~Region,
          type = "scatter", mode = "markers", colors = "viridis")
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
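
The message above is informational: after `summarise()` the result is still grouped by the first grouping variable. A minimal sketch of the override it mentions, using a small made-up data frame (`toy` and its values are hypothetical stand-ins, not the `theme_park` data):

```r
library(dplyr)

# Hypothetical toy data standing in for theme_park (values are made up)
toy <- tibble(
  Region = c("A", "A", "B", "B"),
  Year = c(2019, 2020, 2019, 2020),
  Attendance = c(100, 50, 80, 40)
)

# .groups = "drop" returns an ungrouped tibble and suppresses the message
toy_means <- toy |>
  group_by(Region, Year) |>
  summarise(attend_mean = mean(Attendance), .groups = "drop")

# Check whether any grouping remains on the result
is_grouped_df(toy_means)
```

Dropping the grouping explicitly is mostly a matter of tidiness here, since the summarised values themselves do not change.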

ANOVA TEST BY TYPE

\[H_0: \mu_{\text{Amusement/Theme Park}} = \mu_{\text{Water Park}} = \mu_{\text{Museum}} ~~ \text{vs} ~~ H_1: \text{at least two means are not equal}\]

anova_1 = aov(Attendance ~ Type, data = theme_park)

summary(anova_1)
##              Df    Sum Sq   Mean Sq F value Pr(>F)    
## Type          2 1.127e+17 5.635e+16   105.3 <2e-16 ***
## Residuals   737 3.944e+17 5.351e+14                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With a p-value below 2e-16 (F = 105.3 on 2 and 737 degrees of freedom), we reject the null hypothesis. There is evidence that mean attendance differs between at least two of the three facility types.
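
As a quick check on the reported output, the F value can be recovered from the sums of squares and degrees of freedom in the ANOVA table (rounded as printed, so the result is approximate):

\[F = \frac{SS_{\text{Type}} / df_{\text{Type}}}{SS_{\text{Residuals}} / df_{\text{Residuals}}} = \frac{1.127 \times 10^{17} / 2}{3.944 \times 10^{17} / 737} \approx \frac{5.635 \times 10^{16}}{5.351 \times 10^{14}} \approx 105.3\]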